Semi-supervised learning for Machine Translation
Authors
Abstract
Statistical machine translation systems are usually trained on large amounts of bilingual text, which is used to learn a translation model, and on large amounts of monolingual text in the target language, which is used to train a language model. In this chapter we explore semi-supervised methods that make effective use of monolingual data from the source language in order to improve translation quality. In particular, we use monolingual source-language data from the same domain as the test set (without directly using the test set itself) and apply semi-supervised methods to adapt the model to the test-set domain. We propose several algorithms with this aim and present the strengths and weaknesses of each one. We present detailed experimental evaluations using French–English and Chinese–English data and show that under some settings translation quality can be improved.
Similar resources
A semi-supervised learning approach for morpheme segmentation for an Arabic dialect
We present a semi-supervised learning approach which utilizes a heuristic model for learning morpheme segmentation for Arabic dialects. We evaluate our approach by applying morpheme segmentation to the training data of a statistical machine translation (SMT) system. Experiments show that our approach is less sensitive to the availability of annotated stems than a previous rule-based approach an...
Graph-based Learning for Statistical Machine Translation
Current phrase-based statistical machine translation systems process each test sentence in isolation and do not enforce global consistency constraints, even though the test data is often internally consistent with respect to topic or style. We propose a new consistency model for machine translation in the form of a graph-based semi-supervised learning algorithm that exploits similarities betwee...
Improved Arabic Dialect Classification with Social Media Data
Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their comb...
Active Semi-Supervised Learning for Improving Word Alignment
Word alignment models form an important part of building statistical machine translation systems. Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial alignments acquired from humans. Such dedicated elicitation effort is often expensive and depends on availability of bilingual speakers for the language-pair. In this paper we st...
Transductive learning for statistical machine translation
Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and w...
Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation
In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep autoencoder (DAE) paradigm for phrase-based translation model. Using the unsupervised pre-trained deep belief net (DBN) to initialize DAE’s parameters and using the input original phrase features as a teacher for semi-supervised fine-tunin...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008